This report details an analysis of the GapMinder data, containing various economic statistics for countries across the world, from 1962 to 2007.
Energy use differed markedly between continents, reflecting a strong link between GDP per capita and energy usage (Graph 1). Continents made up largely of developed nations, such as Europe and Oceania, had consistently higher energy usage throughout the period 1962-2007. Energy usage also increased over time for most continents.
The graph below shows energy use per continent over time and reveals several trends. One such trend is the gap between continents made up predominantly of developed nations (Oceania and Europe) and those made up predominantly of developing nations (Africa). The large drop between 1962 and 1972 for the Americas and Asia is due to missing data for developing countries, which was only rectified in the 1972 data. This raises the question of the surprisingly low median values for Asia and the Americas, which include Japan, Canada and the United States. Because these developed economies have much higher energy usage, they are effectively outliers when grouped with their geographical neighbors. GDP is therefore the hidden factor here, which reduces the value of continent as a discriminating variable.
To further understand the impact of GDP on energy use, data from a single year (2007) was analyzed. Unsurprisingly, the relationship between energy use and GDP was positive and linear, as shown by the regression line in the graph below. This explains many of the features of energy use by continent shown in Graph 1.
Imports for European and Asian countries were compared from 1992 to 2007. There are four sets of data, spaced 5 years apart. Whilst there is some variation in imports from year to year, the differences between the two continents were not significant in any year.
Further details of the analysis process are given in Appendix 2.
The country with the highest mean ranking for population density between 1962 and 2007 is the Macao Special Administrative Region (SAR), followed by Monaco and Hong Kong SAR.
The top 5 nations are detailed in the table below.

| Country | Average Population Density |
|---|---|
| Macao SAR, China | 14732 |
| Monaco | 14090 |
| Hong Kong SAR, China | 5153 |
| Singapore | 4361 |
| Gibraltar | 2622 |
The greatest absolute increase in life expectancy occurred in the Maldives, with a 37-year increase between 1962 and 2007. As a percentage, however, Bhutan’s life expectancy increased the most, by 100%, from 33 to 66 years.
| Country | 1962 | 2007 | Life Expectancy Increase in Years |
|---|---|---|---|
| Maldives | 38.5 | 75.4 | 36.9 |
| Bhutan | 33.1 | 66.3 | 33.2 |
| Timor-Leste | 34.7 | 65.8 | 31.1 |
| Tunisia | 43.3 | 74.2 | 30.9 |
| Oman | 44.3 | 75.1 | 30.8 |

| Country | 1962 | 2007 | Life Expectancy % Increase |
|---|---|---|---|
| Bhutan | 33.1 | 66.3 | 100.3 |
| Maldives | 38.5 | 75.4 | 95.9 |
| Mali | 28.5 | 54.3 | 90.1 |
| Timor-Leste | 34.7 | 65.8 | 89.5 |
| Nepal | 36.0 | 66.6 | 85.1 |
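
The two rankings differ because one uses the raw difference in years and the other divides that difference by the 1962 value. The sketch below works through that arithmetic for the Maldives and Bhutan, using the rounded figures from the tables above. The data frame and column names are illustrative assumptions rather than the names used in the original analysis. Because the inputs are rounded to one decimal place, the computed percentages can differ slightly from the tabulated ones, which may have been derived from unrounded data.

```r
# Values copied from the tables above (rounded to one decimal place).
# The data frame and column names are hypothetical.
life_exp <- data.frame(
  country = c("Maldives", "Bhutan"),
  le_1962 = c(38.5, 33.1),
  le_2007 = c(75.4, 66.3)
)

# Absolute increase in years, and increase relative to the 1962 value.
life_exp$abs_increase <- life_exp$le_2007 - life_exp$le_1962
life_exp$pct_increase <- 100 * life_exp$abs_increase / life_exp$le_1962

# Country with the largest increase under each measure.
top_abs <- life_exp$country[which.max(life_exp$abs_increase)]
top_pct <- life_exp$country[which.max(life_exp$pct_increase)]

print(life_exp)
```

A low starting value inflates the percentage measure, which is why Bhutan overtakes the Maldives when the increase is expressed relative to 1962.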
From 1962 to 1967, the data for both the Americas and Asia only included developed nations such as the USA, Canada and Japan, leading to very high median energy usage. From 1972, energy usage data for less developed nations was added, leading to large drops in median energy usage for Asia and the Americas.
A review of energy use by continent revealed that several continents contained developed nations from the G12, all of which have much higher energy usage than developing nations. These nations are outliers within their continents, so the median was preferred over the mean for all summary statistics.
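
The effect of such outliers on the mean and the median can be illustrated with a small sketch. The values below are made up for illustration only and are not taken from the GapMinder data.

```r
# Made-up energy-use values (illustrative only, not GapMinder data):
# five low-usage countries and one high-usage outlier.
energy_use <- c(300, 350, 400, 450, 500, 8000)

mean_use   <- mean(energy_use)    # pulled upward by the single outlier
median_use <- median(energy_use)  # stays near the bulk of the values

print(c(mean = mean_use, median = median_use))
```

In this toy sample the mean lies above five of the six values, while the median stays between the third and fourth values.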
The graphs below demonstrate the issue of developed and developing countries in Asia and the Americas for a single year.
The data was split by year into imports for Asia and Europe. For each year, tests of variance and normality were carried out. The data sets for each year were found to be heteroscedastic and, with the exception of Europe in 2007, non-normally distributed.
Attempts at data transformation had no effect on the distributions.
For these reasons, the non-parametric Wilcoxon Rank Sum Test was used to assess whether there was a significant difference in imports between Europe and Asia.
The following code was run on a subset of the data (eurasia), containing only data from Europe and Asia, to test for equal variance in each year. The Asian and European data sets were found to be heteroscedastic in all years.
library(dplyr)  # provides %>%, filter() and select()
for (Y in seq(from = 1992, to = 2007, by = 5)) {
  asia <- eurasia %>% filter(continent == "Asia", Year == Y) %>% select(imports)
  europe <- eurasia %>% filter(continent == "Europe", Year == Y) %>% select(imports)
  bt <- var.test(asia$imports, europe$imports)  # F test for equality of variances
  if (bt$p.value < 0.05) {
    print(paste("Year ", Y, " - Samples Have Different Variance p=", bt$p.value))
  } else {
    print(paste("Year ", Y, " - Samples Have Same Variance p=", bt$p.value))
  }
}
## [1] "Year 1992 - Samples Have Different Variance p= 0.00177440987783983"
## [1] "Year 1997 - Samples Have Different Variance p= 8.81431299264435e-05"
## [1] "Year 2002 - Samples Have Different Variance p= 3.04778763520197e-05"
## [1] "Year 2007 - Samples Have Different Variance p= 0.000354648159186954"
After the comparison of variances, the samples were assessed for normality. The following code was run on the same subset of the data (eurasia), containing only data from Europe and Asia. All samples were found to be NOT normally distributed, except the Europe sample for 2007 (p = 0.076).
for (Y in seq(from = 1992, to = 2007, by = 5)) {
  asia <- eurasia %>% filter(continent == "Asia", Year == Y) %>% select(imports)
  europe <- eurasia %>% filter(continent == "Europe", Year == Y) %>% select(imports)
  shap_asia <- shapiro.test(asia$imports)      # Shapiro-Wilk normality test
  shap_europe <- shapiro.test(europe$imports)
  if (shap_asia$p.value < 0.05) {
    print(paste("Year ", Y, " - Asia Sample is not Normal Dist p=", shap_asia$p.value))
  } else {
    print(paste("Year ", Y, " - Asia sample has Normal Dist p=", shap_asia$p.value))
  }
  if (shap_europe$p.value < 0.05) {
    print(paste("Year ", Y, " - Europe sample is Not Normal Dist p=", shap_europe$p.value))
  } else {
    print(paste("Year ", Y, " - Europe sample has Normal Dist p=", shap_europe$p.value))
  }
}
## [1] "Year 1992 - Asia Sample is not Normal Dist p= 0.00692754055163552"
## [1] "Year 1992 - Europe sample is Not Normal Dist p= 0.00159530937109208"
## [1] "Year 1997 - Asia Sample is not Normal Dist p= 0.0014279242621728"
## [1] "Year 1997 - Europe sample is Not Normal Dist p= 0.0185145480399587"
## [1] "Year 2002 - Asia Sample is not Normal Dist p= 0.000865516067436733"
## [1] "Year 2002 - Europe sample is Not Normal Dist p= 0.0270614166355554"
## [1] "Year 2007 - Asia Sample is not Normal Dist p= 0.000526159274481657"
## [1] "Year 2007 - Europe sample has Normal Dist p= 0.0757556476330589"
Since the data is heteroscedastic and largely non-normal, the Wilcoxon Rank Sum test was used to confirm or exclude any significant difference in imports for each year of data collected.
The following code was run on a subset of the data (eurasia), containing only data from Europe and Asia.
The result for all years was that there was NO significant difference in imports between Asia and Europe.
for (Y in seq(from = 1992, to = 2007, by = 5)) {
  asia <- eurasia %>% filter(continent == "Asia", Year == Y) %>% select(imports)
  europe <- eurasia %>% filter(continent == "Europe", Year == Y) %>% select(imports)
  wt <- wilcox.test(asia$imports, europe$imports)
  print(paste("Year ", Y, " - Wilcoxon Rank Sum. p=", wt$p.value))
}
## [1] "Year 1992 - Wilcoxon Rank Sum. p= 0.481477146310066"
## [1] "Year 1997 - Wilcoxon Rank Sum. p= 0.334321291832349"
## [1] "Year 2002 - Wilcoxon Rank Sum. p= 0.869442490313876"
## [1] "Year 2007 - Wilcoxon Rank Sum. p= 0.406469388988677"
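
The three loops in this appendix repeat the same filtering step for each test. As a possible consolidation, the sketch below collects all three tests for one year into a single helper and returns the p-values as a table. The helper `run_tests_for_year` and the data frame `fake_eurasia` are hypothetical and not part of the original analysis. The data is simulated purely so that the example runs on its own, so the printed values do not correspond to the results reported above.

```r
# Hypothetical helper (not part of the original analysis): runs all three
# tests for one year and returns the p-values as a one-row data frame.
run_tests_for_year <- function(data, year) {
  asia   <- data$imports[data$continent == "Asia"   & data$Year == year]
  europe <- data$imports[data$continent == "Europe" & data$Year == year]
  data.frame(
    Year             = year,
    p_variance       = var.test(asia, europe)$p.value,    # F test
    p_shapiro_asia   = shapiro.test(asia)$p.value,        # Shapiro-Wilk
    p_shapiro_europe = shapiro.test(europe)$p.value,
    p_wilcoxon       = wilcox.test(asia, europe)$p.value  # rank-sum test
  )
}

# Simulated placeholder data with the same column names as `eurasia`.
# These numbers are NOT the GapMinder imports.
set.seed(42)
years <- seq(from = 1992, to = 2007, by = 5)
fake_eurasia <- data.frame(
  continent = rep(c("Asia", "Europe"), each = 30, times = length(years)),
  Year      = rep(years, each = 60),
  imports   = rexp(60 * length(years), rate = 1 / 40)
)

# One row of p-values per year.
results <- do.call(rbind, lapply(years, run_tests_for_year, data = fake_eurasia))
print(results)
```

With the real data, `fake_eurasia` would be replaced by the `eurasia` subset used above. Returning a data frame rather than printing strings keeps the p-values available for later formatting or filtering.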